DGet! An open source deuteration calculator for mass spectrometry data

DGet! is an open-source analysis package written in Python for calculating the degree of deuterium enrichment in isotopically labelled molecules using mass spectrometric data. The nuclear properties of deuterium make it a valuable tracer in metabolic studies and an excellent contrast agent in nuclear spectroscopies. Determination of molecular deuteration levels is typically performed using mass spectrometry, however software options to perform these calculations are scarce. The in-house scripts and spreadsheets currently used rarely account for isotopic interferences from 13C or multi-isotopic elements that impact deuteration calculations. DGet! removes isotopic interferences using de-convolution and both the isotopological makeup and overall deuteration level can be accurately recovered. The software is available as command line and web applications that take a molecular formula and mass spectrometry data and output a graphical representation of the degree of deuteration as well as the distribution of partially deuterated analogues. These applications are designed to be easy to use and enable superior characterisation of deuterated molecules for users of all levels of expertise, without the limitations of techniques currently used by the majority of deuteration laboratories and researchers. Graphical Abstract Supplementary Information The online version contains supplementary material available at 10.1186/s13321-024-00828-x.

When performing H/D exchange reactions, the exchange of protons is typically non-specific; i.e. exchange of H to D is random and occurs universally across the molecule at each site to differing extents [27].
The variation in deuteration levels can be attributed to the steric availability of the exchangeable C-H positions, or to differing levels of stability at certain positions [27,28].Given the applications of these deuterated molecules, it is necessary to quantify both the overall and site-specific deuterium content to assess the degree of deuterium enrichment and to understand the distribution of isotopologues.This is especially important in cases where the deuterated analogues are used in quantitative analytical techniques or in neutron experiments as even slight hydrogen contamination will present non-negligible negative complications.
Typically, the quantification of overall deuteration is assessed using mass spectrometry (MS) by analysing the isotopic mass distribution (e.g. Figure 1b), whilst more robust techniques such as evaluation of residual 1 H nuclear magnetic resonance signals or interpreting isotopic effects on 13 C chemical shifts are required to determine site-specific deuteration [27,29,30].While there is an abundance of software for calculating hydrogen-deuterium exchange (HDX) data [31,32] for protein and peptide related studies, open-source tools for small molecule deuterium incorporation applications do not exist.The majority of deuteration laboratories still use inhouse scripts and spreadsheets to determine overall molecular deuteration; these existing techniques have several shortcomings such as not being able to remove the effects of interfering isotopes, produce a representation of the distribution of isotopologues, account for all possible adducts, or output a graphical representation of the deuteration fit.These shortcomings introduce difficulties in interpreting mass spectrometry data for complicated deuterated molecules containing a high number of carbon atoms, atoms with high numbers Here we provide a simple software tool for routinely analysing and determining the total deuterium enrichment and distribution of isotopologues using mass spectrometry: DGet!.The DGet! software package provides users with both command line and web applications capable of analysing MS data to evaluate deuterium content.

Implementation
The software for deuteration calculation was written in Python 3.11 using the publicly available "molmass" and "NumPy" libraries and is hosted on GitHub (https:// github.com/ djdt/ dget) and the Python Package Index.Documentation including an API reference is also available (https:// dget.readt hedocs.io/ en/ latest).A web application has been created for users not experienced with the terminal and can be accessed through the GitHub repository or hosted locally.
The general processing workflow for DGet! is shown in Fig. 2 and requires two inputs; the molecular formula of the fully deuterated molecule and a file containing MS m/z and signal intensity values.MS data is supplied to the software as a delimited text file with at least two columns, one each for m/z and signal intensities.Passing these two options supports high-resolution MS (HRMS) spectra exports for the majority of vendors and options can usually be inferred from the file itself.A small module for importing Shimadzu HRMS data is included due to the different structure of the output of MS spectra.
If the adduct formed during ionisation is known then it can also be passed to the program in the form '[M + X] y ' , otherwise the adduct is selected from a list of those commonly observed in electrospray ionisation [33].In this case, the adduct with monoisotopic mass closest to that of the most intense MS peak within the expected deuteration spectrum is automatically chosen.This automatic selection can provide a good starting point for calculations but the chosen adduct should be verified by the user.
Calculation of the deuteration level is performed by first determining the isotopic spectra of the target MS species from the input formula and adduct.This spectrum is used as a response function h to de-convolve the individual contribution of each isotopologue from the mass spectrum.Target m/z values are created by combining the spectra of every possible isotopologue and then used to extract signals from the mass spectrum.Extraction of peak heights (or peak areas depending on the user's preference) is performed by either selecting the maximum data point within the target region or by integration.By default, the target regions are at each target m/z value ± 0.5 Da.Probabilities for each isotopologue y are then calculated by de-convolving the extracted signals x with the isotopic spectra in the frequency domain (Eq.1).This is shown in Fig. 3, where the individual contributions of deuterium isotopolouges are extracted and displayed.
The percent deuteration is then calculated using Eq. 2, where N is the deuterium count and y i is the normalised value of y for each deuteration state, i .The degree of deuteration is determined as the percent of the expected deuterium content taken from the number of deuterium atoms in the inputted molecular formular, not the overall H/D ratio.
(1) In molecules with many deuterium atoms the full spectra (from zero to N deuterium) will cover a large mass range.This exposes the calculation to interferences iy i from contaminants or other, lower mass, adducts.To prevent these interferences, the calculation is cutoff at the lowest deuterium isotopologue where: there are two consecutive isotopologues below 1% probability and the total probability of all valid isotopologues is greater Mass spectra for NDF compounds with published isotopologue makeups were processed using DGet! and compared.In general, there was good agreement between the software and publications than 10%.The cutoff can also be manually assigned a deuterium count or m/z.Issues with incorrect mass calibration of the mass spectrometer can be corrected with ease; in this case DGet! will apply a m/z offset (up to a maximum of ± 0.5 Da) to the input data to align it with the predicted spectra.
A web application for DGet! is provided in the form of a Flask module; when using the web application, a report is automatically generated which contains useful information such as overall deuteration level, distribution of partially deuterated isotopologues, and a graphical plot showing the experimental mass spectra overlayed with the predicted isotopic spectra of the target molecule (Additional file 1: Figure S9).An Illustrative guide to the usage of the web application for DGet! has been included in Additional file 1.

Results and discussion
Compound test data was acquired from the Australian National Deuteration Facility (NDF).Each compound was provided with a molecular formula, mass spectral data and previously determined deuteration.Only compounds with clear mass spectra and an existing publication stating the percent deuteration were included.The 20 compounds are listed in Table S1 (Additional file 1) and their spectra are available in Additional file 2.
Compounds from the NDF were analysed using DGet! to compare the calculated percentage deuteration for a representation of commonly available commercial deuterated small molecules in the literature (See additional file 1) from a wide range of classes, molecular weights and deuterium content.The average absolute difference in the reported and calculated deuteration levels for the 20 compared compounds was 0.4 ± 0.3%.These errors demonstrate the ability of DGet! to produce deuterium levels that are in excellent agreement with those previously generated using manual methods.The isotopic distribution was available for 14 compounds and compared to those retrieved using the software (Fig. 4).The average absolute error for the recovery of individual isotopologues was 2 ± 3% when using DGet!It is not clear if this represents an error or improvement over traditional methods as the methods for calculation of these states, including if isotopic interferences from 13 C and other isotopes were accounted for, was not always available.
For molecules with complicated isotopic spectra, de-convolution improves the accuracy of both individual deuterium isotopologue and overall deuteration percentage.To demonstrate this hydrogen isotope Fig. 5 Recovery of isotopologues with and without de-convolution.The recovery of isotopologues for C 6 D 14 is successful even without de-convolution.However, there is a large overestimation of highly deuterated isotopologues for C 6 D 13 Br when the additional Br isotopes are not compensated for exchange simulations (using Additional File 3) were performed on C 6 D 14 and C 6 D 13 Br, with a rate ratio of 2.18 and to a final overall deuteration of 60% [34].In Fig. 5, the isotopic distribution due to 13 C isotopes has a small impact on the calculation of overall deuteration and both raw peak heights and de-convolution gave accurate results (60%) for C 6 D 14 and recovery of the individual isotopologues was within 0.1%.However, once a more complicated spectra is introduced in C 6 D 13 Br, only the de-convolution is accurate.The 79 Br and 81 Br isotopes overlap with the deuterium spectra, shifting the apparent contributions of isotopologues and giving an apparent deuteration of 65% without de-convolution.This inaccuracy is apparent in the recovery of individual isotopologues, with a 4% error in the recovery of D 13 when de-convolution is not used, more than double the true value.

Conclusion
An open-source Python package was developed to address the limitations of existing deuteration calculation techniques.DGet! provides users with a simple interface in the form of command line and graphical web applications to quantify the degree of deuterium incorporation in deuterium enriched molecules using mass spectrometry data.Re-analysis of data using the new technique produced an equivalent deuterium content and distribution of isotopologues for previously reported compounds.The improved characterisation of molecules with multi-isotopic elements was demonstrated using simulations of C 6 D 13 Br and the analysis of C 20 D 28 O 8 Zr.The ease-of-use, configurability, and ability to be deployed anywhere will facilitate the adoption of the software into established deuterium enrichment calculation workflows, improving the interpretation of MS data, and advancing the field of deuteration chemistry.

Fig. 1
Fig. 1 Mass spectra for undeuterated and deuterated cholesterol.Biosynthetic cholesterol obtained from Saccharomyces sp.grown in protium oxide [C 27 H 46 O] a and deuterium oxide [C 27 HD 45 O] b medium and their associated high-resolution mass spectra

Fig. 2
Fig.2DGet!processing flow.Inputs of a formula and adduct are used to generate the isotopic spectra of the compound and target masses for all isotopologues.Peak heights or areas are extracted from the MS data at these target masses and the isotopologue makeup extracted by de-convolving, using the isotope spectra as a response function.Deuteration percentage and the makeup are then displayed to the user

Fig. 3 Fig. 4
Fig. 3 Mass and isotopologue spectra for C 20 D 28 O 8 Zr.The experimental mass spectra A and calculated isotopic spectra of deuterated C 20 D 28 O 8 Zr B, observed as the adduct [C 15 D 21 O 6 Zr + CH 3 CN] + .The isotopic spectrum is used as a point-spread-function to de-convolve the individual contribution of each isotopologue from the overlay mass spectra C.These contributions (D15-19) are then used to calculate the percentage deuteration